TITLE OF THE INVENTION

IDENTIFICATION OF THE CANDIDA ALBICANS
ESSENTIAL FUNGAL SPECIFIC GENES CaKRE5, CaALR1 AND
CaCDC24 AND USE THEREOF IN ANTIFUNGAL DRUG DISCOVERY

FIELD OF THE INVENTION 

The present invention relates to the identification of
novel essential fungal specific genes isolated in the yeast pathogen
Candida albicans, specifically CaKRE5, CaALR1 and CaCDC24, and
particularly to their structural and functional relatedness to their
Saccharomyces cerevisiae counterparts. More specifically, the invention
relates to the use of CaKRE5, CaALR1 and CaCDC24 in fungal diagnosis
and antifungal drug discovery.

BACKGROUND OF THE INVENTION 

Opportunistic fungi, including Candida albicans, 
Aspergillus fumigatus, Cryptococcus neoformans, and Pneumocystis 
carinii, are a rapidly emerging class of microbial pathogens, which 
cause systemic fungal infection or "mycosis" in patients whose immune 
system is weakened. Candida spp. rank as the predominant genus of 
fungal pathogens, accounting for approx. 8% of all bloodstream 
infections in hospitals today. Alarmingly, the incidence of
life-threatening C. albicans infections, or "candidiasis", has risen
sharply over the last two decades and, ironically, the single greatest
contributing factor to the prevalence of mycosis in hospitals today is
modern medicine itself. Standard medical practices such as
organ transplantation, chemotherapy and radiation therapy suppress
the immune system and make patients highly susceptible to fungal
infection. Modern diseases, most notoriously AIDS, also contribute to
this growing occurrence of fungal infection. In fact, Pneumocystis carinii
infection is the number one cause of mortality for AIDS victims.

Treatment of fungal infection is hampered by the lack
of safe and effective antifungal drugs. Antimycotic compounds used
today, namely polyenes (amphotericin B) and azole-based
derivatives (fluconazole), are of limited efficacy due to the nonspecific
toxicity of the former and emerging resistance to the latter. Resistance
to fluconazole has increased dramatically throughout the decade,
particularly in Candida and Aspergillus spp.

Clearly, new antimycotic compounds must be
developed to combat fungal infection and resistance. Part of the solution
depends on the elucidation of new antifungal drug targets (i.e.,
molecules whose chemical inactivation/disruption results in cell death)
distinct from those of current antifungal drugs, which act by disrupting
membrane/ergosterol composition. Genes that express proteins
essential to cell viability in a broad spectrum of fungi, and that are
absent in humans, serve as novel antifungal drug targets to
which rational drug screening can be applied. In this way, drug
screening can identify specific antifungal compounds that inactivate
essential and fungal-specific genes, thereby mimicking the validated
effect of the gene disruption.

A major advance in the study of pathogenesis and
antifungal drug development comes from genome sequencing projects
recently completed for the baker's yeast Saccharomyces cerevisiae and
currently under way in C. albicans. Although S. cerevisiae is not itself
pathogenic, it is closely related taxonomically to opportunistic pathogens
including C. albicans. Consequently, many of the genes identified and
studied in S. cerevisiae lend valuable insight into the identification and
functional analysis of homologous genes present in the wealth of
sequence information provided by the Stanford C. albicans genome
project (http://candida.stanford.edu), accelerating the isolation of C.
albicans genes which may participate in the process of pathogenicity
and cell viability.

Another dramatic advance from which antifungal drug
discovery will benefit comes from the S. cerevisiae gene disruption
consortium, in which the entire genome is being systematically disrupted
(http://sequence-www.stanford.edu/group/yeast_deletion_project/).
Identification of all essential genes in this organism will enable strong
predictions to be made as to which genes in C. albicans are similarly
essential for cell viability.

The Bussey laboratory is a prominent contributor to 
the S. cerevisiae functional genomics project and has begun to apply 
this information to identifying potential antifungal drug targets in C. 
albicans (1). We have continued this approach to clone additional genes 
known to be essential for viability in S. cerevisiae and directly test 
whether an identical phenotype is observed in C. albicans. Such genes 
which are found to be essential in C. albicans serve as validated 
antifungal drug targets and provide novel reagents in antifungal drug 
screening programs. 

There thus remains a need to identify essential fungal 
specific genes in Candida albicans and to use such genes in the 
discovery of drugs specifically directed against fungal pathogens. 

The present invention seeks to meet these and other
needs.

The present description refers to a number of 
documents, the content of which is herein incorporated by reference. 

SUMMARY OF THE INVENTION

The invention concerns essential fungal specific genes 
in Candida albicans and their use in antifungal drug discovery. 

The present invention further relates to the identification
and disruption of the Candida albicans fungal specific genes CaKRE5,
CaALR1, and CaCDC24, which reveals their structural and functional
relatedness to their Saccharomyces cerevisiae counterparts and validates
their utility in fungal diagnosis and antifungal drug discovery.

In accordance with the present invention, full length
clones of CaKRE5, CaCDC24 and CaALR1 were isolated. Oligonucleotides
designed from available fragments of C. albicans DNA sequence were
used in Polymerase Chain Reaction (PCR) to amplify genomic DNA
derived from C. albicans. The PCR products were
radiolabeled and used to probe the C. albicans genomic library by colony
hybridization. DNA sequencing revealed complete open reading frames
of CaKRE5, CaCDC24 and CaALR1 sharing statistically significant
homology to their S. cerevisiae counterparts, namely KRE5, CDC24 and
ALR1, all of which have met several criteria expected for potential
antifungal drug targets.

In accordance with the present invention, disruption of
CaKRE5, CaCDC24 and CaALR1 was performed. The disruption
plasmids were digested and transformed into C. albicans strain CAI-4.
Southern blot analysis confirmed that the aforementioned genes are
essential in C. albicans.

According to another aspect of the present invention,
CaKRE5, CaCDC24 and CaALR1 were used in antifungal screening
assays which confirmed their potential to screen for novel antifungal
compounds.

While US Patent 5,194,600 claims the use of the
S. cerevisiae KRE5 gene, a number of observations from fungal biology
make it far from obvious whether such a gene is present in
a pathogenic yeast, what role it plays there, and whether it would be
essential or otherwise have utility as an antifungal target. These
observations are listed below.

a) A related gene, GPT1, in the yeast S. pombe is
not essential, is thought to be involved in protein folding, fails to
complement the S. cerevisiae kre5 mutant, and fails to reduce
β-(1,6)-glucan polymer levels in this yeast.

b) The β-(1,6)-glucan polymer could be made in a
different way in different yeasts.

c) Genes are lost during evolution and it was not obvious
that C. albicans retained a KRE5-related gene. For example,
CaKRE5 fails to complement an S. cerevisiae kre5 mutant, so no gene
could be recovered by such an approach. Similarly, the DNA sequence of
the C. albicans CaKRE5 gene is sufficiently different from that of
S. cerevisiae that it cannot be detected by low stringency Southern
hybridization with the S. cerevisiae KRE5 gene as a probe.

BRIEF DESCRIPTION OF THE DRAWINGS 

Having thus generally described the invention, reference 
will now be made to the accompanying drawings, showing by way of 
illustration a preferred embodiment thereof, and in which: 

Figure 1 shows CaKRE5 sequence and comparison to
the S. cerevisiae KRE5, Drosophila melanogaster UGGT1, and
S. pombe GPT1 encoded proteins. (A) illustrates nucleotide and
predicted amino acid sequence of CaKRE5. The CaKRE5 signal peptide
is underlined in bold. The ER retention sequence His-Asp-Glu-Leu
(HDEL) is indicated in bold at the C-terminus. Non-canonical CTG
codons encoding Ser in place of Leu are italicized. (B) shows protein
sequence alignment between CaKre5p, Kre5p, Gpt1p, and Uggtp.
Proteins are shown in single-letter amino acid code with amino acid
identities shaded in black and similarities shaded in gray. Gaps
introduced to improve alignment are indicated by dashes and amino
acid positions are shown at the left;

Figure 2 shows CaALR1 sequence and comparison to
S. cerevisiae Alr1p and Alr2p. (A) illustrates nucleotide and predicted
amino acid sequence of CaALR1. Two hydrophobic amino acid stretches
predicted to serve as transmembrane domains are indicated in bold.
Non-canonical CTG codons are italicized. (B) shows protein sequence
alignment between CaAlr1p, Alr1p, and Alr2p. Proteins are shown in
single-letter amino acid code with amino acid identities shaded in black
and similarities shaded in gray. Dashes indicate gaps introduced to
improve alignment;

Figure 3 shows CaCDC24 sequence and comparison
to CDC24 from S. cerevisiae and S. pombe. (A) illustrates nucleotide
and predicted amino acid sequence of CaCDC24. Non-canonical CTG
codons are italicized. (B) shows protein sequence alignment between
CaCdc24p, S. cerevisiae Cdc24p, and the S. pombe homolog, Scd1p.
The CaCdc24p dbl homology domain extends from amino acids 280-500.
A pleckstrin homology domain is detected from residues 500-700.
Protein alignments are formatted as described in Fig. 1 and 2; and

Figure 4 illustrates disruption of CaKRE5, CaALR1, and
CaCDC24. Restriction maps of (A) CaKRE5, (B) CaALR1, and (C)
CaCDC24 display restriction sites pertinent to disruption strategies. The
insertion position of the hisG-URA3-hisG disruption module relative to the
CaKRE5, CaALR1, and CaCDC24 open reading frames (indicated by
open arrows) is indicated, as well as probes used to verify disruptions by
Southern blot analysis. (D-F) show Southern blot verification of targeted
integration of the hisG-URA3-hisG disruption module into CaKRE5,
CaALR1, and CaCDC24 and its precise excision after 5-FOA
treatment. (D) shows genomic DNA extracted from Candida albicans
wild-type strain CAI-4 (lane 1), heterozygote
CaKRE5/cakre5Δ::hisG-URA3-hisG (lane 2), heterozygote
CaKRE5/cakre5Δ::hisG after 5-FOA treatment (lane 3), and a
representative transformant resulting from the second round of
transformation into a CaKRE5/cakre5Δ::hisG heterozygote (lane 4),
digested with HindIII and analyzed using CaKRE5, hisG, and CaURA3
probes. Asterisks identify the 1.6 kb ladder fragment that nonspecifically
hybridizes to the three probes. (E) shows genomic DNA extracted from
CAI-4 (lane 1), heterozygote CaALR1/caalr1Δ::hisG-URA3-hisG (lane 2),
heterozygote CaALR1/caalr1Δ::hisG after 5-FOA treatment (lane 3), and
a representative transformant resulting from the second round of
transformation into a CaALR1/caalr1Δ::hisG heterozygote (lane 4),
digested with EcoRI and analyzed using CaALR1, hisG, and
CaURA3 probes. (F) shows genomic DNA extracted from CAI-4 (lane 1),
heterozygote CaCDC24/cacdc24Δ::hisG-URA3-hisG containing the
disruption module in orientation 1 (lane 2), heterozygote
CaCDC24/cacdc24Δ::hisG-URA3-hisG containing the disruption module
in orientation 2 (lane 3), heterozygote CaCDC24/cacdc24Δ::hisG
(orientation 1) after 5-FOA treatment (lane 4), heterozygote
CaCDC24/cacdc24Δ::hisG (orientation 2) after 5-FOA treatment (lane 5) and
a representative transformant resulting from the second round of
transformation into a CaCDC24/cacdc24Δ::hisG (orientation 1) heterozygote
(lane 6), digested with EcoRI and analyzed using CaCDC24, hisG,
and CaURA3 probes.

Other objects, advantages and features of the present
invention will become more apparent upon reading of the following
non-restrictive description of preferred embodiments with reference to the
accompanying drawings, which are exemplary and should not be interpreted
as limiting the scope of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENT 

We have identified C. albicans genes homologous to 
the essential genes KRE5, CDC24, and ALR1 from S. cerevisiae. These 
genes participate in essential cellular functions of cell wall biosynthesis,
polarized growth, and divalent cation transport, respectively. Disruption 
of these genes in C. albicans experimentally demonstrates their 
essential role in this pathogenic yeast. Database searches fail to 
identify clear homologous counterparts in mammalian genomes, 
supporting the utility of these genes as novel antifungal targets. 

KRE5 

The S. cerevisiae KRE5 gene meets several criteria
expected for a potential antifungal drug target. Deletion of KRE5 confers
a lethal phenotype (2). Although KRE5-deleted cells are known to be
viable in one particular strain background, they are extremely slow
growing and spontaneous extragenic suppressors are required to
propagate kre5Δ cells under laboratory conditions. Genetic analyses
suggest that KRE5, together with a number of additional KRE genes,
participates in the in vivo synthesis of β-(1,6)-glucan. β-(1,6)-glucan
covalently cross-links or "glues" other cell surface constituents, namely
β-(1,3)-glucan, mannan, and chitin, into the final wall structure
and has been shown to be essential for viability in both S. cerevisiae
and C. albicans (1,2 and references therein). Moreover, β-(1,6)-glucan
has been demonstrated to exist in a number of additional fungal classes
including other yeast and filamentous Ascomycetes, Basidiomycetes and
Oomycetes. Importantly, however, efforts have failed to detect
β-(1,6)-glucan in higher eukaryotes.

Consistent with a role in β-(1,6)-glucan biosynthesis, in
vivo levels of this polymer are reduced substantially in kre5-1 cells
versus an isogenic wild type strain, and are completely absent in several
independently-suppressed kre5 null strains (2). In addition, kre5
mutants show a number of genetic interactions with kre6, another gene
involved in β-(1,6)-glucan assembly (Shahinian and Bussey, personal
communication). Although the biochemistry of β-(1,6)-glucan synthesis
remains poorly understood, recent studies demonstrate that cell wall
mannoproteins are extensively glucosylated through β-(1,6) linkages
and that this modification plays a central role in their anchorage within
the extracellular matrix. KRE5 plays a critical role in this process as
well, as Cwp1p, an abundant cell wall protein which is demonstrated
to be highly glucosylated through β-(1,6)-glucan addition, is undetected
in the cell wall fraction of kre5Δ cells, and is instead secreted into the
medium.

The predicted KRE5 gene product offers only limited
insight into a possible biochemical activity related to β-(1,6)-glucan
production. KRE5 encodes a large secretory protein containing both an
N-terminal signal peptide and C-terminal HDEL retention signal for
localization to the endoplasmic reticulum. Interestingly, Kre5p has
limited but significant homology to UDP-glucose:glycoprotein
glucosyltransferases (UGGT), an enzyme class participating in the
"quality control" of protein folding. Such UGGT enzymes function to
"flag" misfolded ER proteins by reglucosylation of N-linked GlcNAc2Man9
core oligosaccharide structures present on misfolded proteins. Proteins
labelled in this way are substrates for the ER chaperone, calnexin,
which facilitates refolding of the misfolded protein. However, genetic
analyses to address the relative involvement of KRE5 in
glucosylation-dependent protein folding and β-(1,6)-glucan biosynthesis
demonstrate that the essential function of KRE5 is unrelated to protein
folding, and instead relates to its role in β-(1,6)-glucan polymer
biosynthesis (3). Although it remains to be demonstrated
biochemically, KRE5 homology to glycosyltransferases likely reflects
its role in the early biosynthesis of this polymer.



ALR1 



The product of the S. cerevisiae gene, ALR1, also meets
several of the conditions necessary for a suitable antifungal drug
target. Strains deleted of ALR1 show limited growth with
supplementary Mg2+ but are otherwise inviable (4). These results
demonstrate that ALR1 is essential for growth. ALR1 encodes a 922
amino acid protein containing a highly charged N-terminal domain and
two hydrophobic C-terminal regions predicted to serve as membrane
spanning domains anchoring the protein at the plasma membrane.
Although such a localization remains to be directly demonstrated,
deposition to the cell surface makes Alr1p an attractive drug target in
terms of both bioavailability and resistance issues (see Discussion).
Alr1p shares substantial homology to two additional S. cerevisiae
proteins, Alr2p (70% identity) and Ykl064p (34% identity). Both Alr1p and
Alr2p share limited similarity to CorA, a Salmonella typhimurium
periplasmic membrane protein involved in divalent cation transport.
Mammalian homologues to ALR1 have not been detected despite
extensive database searches and the gene is absent from the metazoan
Caenorhabditis elegans.

Although ALR1 was identified in a screen for genes that
confer increased tolerance to Al3+ when overexpressed, biochemical
analyses support a role for ALR1 in the uptake system for Mg2+ and
possibly other divalent cations. Mg2+ is an essential requirement for
bacterial and yeast growth. Uptake of radiolabeled Co2+, an analog
of Mg2+ for uptake assays, correlates with ALR1 activity.
Overexpression of ALR1 increased Co2+ uptake four-fold, while deletion
of ALR1 substantially reduced uptake. As mentioned above, Alr1p shares
structural and sequence similarity to CorA, an extensively characterized
Mg2+ import protein, and deletion of ALR1 is only suppressed with the
addition of supplementary Mg2+.

CDC24 

A third potential antifungal drug target is the S.
cerevisiae gene, CDC24. CDC24 is essential for viability in
both S. cerevisiae and S. pombe (5). CDC24 has been
biochemically demonstrated to encode a protein with GDP-GTP
nucleotide exchange factor (GEF) activity towards Cdc42p, a Rac/Rho-type
GTPase involved in polarization of the actin cytoskeleton. Conditional
alleles of CDC24 shifted to the nonpermissive temperature lack a
polarized distribution of actin, and consequently form large, spherical,
unbudded cells in which the normal polarized deposition of cell wall
material is disrupted. Eventually cdc24 mutants lyse at the restrictive
temperature. CDC24-dependent activation of CDC42 is also required
for the activation of the pheromone response signal transduction
pathway during mating, and likely participates in the activation of this
pathway under conditions that promote pseudohyphal development, since
a downstream effector of CDC42, STE20, is required for hyphal
formation. Thus CDC24 regulates cell wall assembly and the
yeast-hyphal dimorphic transition, both key cellular processes and
targets being actively pursued in antifungal drug screens.



Cdc24p localizes to the cell cortex, concentrating at
sites of polarized growth, and interacts physically with a number of
proteins including Cdc42p, Bem1p, and the heterotrimeric G protein
β and γ subunits encoded by STE4 and STE18, respectively. Cdc24p
shares 24% overall identity to its S. pombe counterpart, Scd1p. Similar
homology has not been found in mammalian database protein searches,
although Cdc24p does possess limited homology to a domain of the
human exchange protein, dbl, and contains a pleckstrin homology
domain, common to several mammalian protein classes. In contrast to
this limited homology of Cdc24p outside of fungi, Cdc42p shares
80-85% identity to mammalian isoforms. The fungal-specificity
of CDC24 may be due to its role in the fungal-specific processes of
bud formation, pseudohyphal growth, and projection formation during
mating, whereas CDC42 performs highly conserved functions (namely
actin polymerization and signal transduction) common to all
eukaryotes.



Isolation of CaKRE5, CaCDC24, and CaALR1.

To isolate full length clones of CaKRE5, CaCDC24,
and CaALR1, oligonucleotides were designed according to publicly
available fragments of C. albicans DNA sequence. Polymerase chain
reaction (PCR) using oligonucleotide pairs CAKRE5.1/CAKRE5.2,
CaCDC24.1/CaCDC24.2, and CaALR1.1/CaALR1.2 to amplify genomic
DNA derived from C. albicans strain SC5314 yielded 574, 299, and 379
bp products, respectively. These PCR products were 32P-radiolabeled
and used to probe a YEp352-based C. albicans genomic library by
colony hybridization.



Sequence Information 

DNA sequencing of two independent isolates
representing putative CaKRE5 and CaALR1 clones revealed
complete open reading frames sharing statistically significant homology
to their S. cerevisiae counterparts (Fig. 1, 2). DNA sequencing of multiple
isolates of CaCDC24 revealed an ORF containing strong identity to CDC24,
but predicted to be truncated at its 3' end. The 3' end of CaCDC24 was
isolated by PCR amplification using one oligonucleotide designed from
its most 3' sequence and a second oligonucleotide which anneals to the
YEp352 polylinker, allowing amplification of CaCDC24 C-terminal
encoding fragments from this C. albicans genomic library. Subcloning
and DNA sequencing of a 1.0 kb PCR product completes the CaCDC24
open reading frame and reveals its gene product to share strong
homology to both Cdc24p and Scd1p (Fig. 3).
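
The figures mark CTG codons because C. albicans reads CTG as Ser rather than Leu, so predicted protein sequences differ from a standard-code translation at those positions. The following is a minimal sketch of that difference, not the procedure used to generate the figures; the function names and the short example sequence are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# C. albicans nuclear code: same table, except CTG is decoded as Ser.
CANDIDA = dict(STANDARD, CTG="S")


def translate(dna, table):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = table[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def ctg_codon_positions(dna):
    """Return 1-based codon positions of in-frame CTG codons."""
    dna = dna.upper()
    return [i // 3 + 1 for i in range(0, len(dna) - 2, 3) if dna[i:i + 3] == "CTG"]


# Hypothetical 4-codon example: ATG CTG AAA TAA
example = "ATGCTGAAATAA"
standard_protein = translate(example, STANDARD)
candida_protein = translate(example, CANDIDA)
```

Comparing the two translations of the same open reading frame, together with `ctg_codon_positions`, identifies the residues that would be mis-assigned as Leu by software assuming the standard code.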

CaKRE5

Sequence analysis reveals CaKRE5 and KRE5 are
predicted to encode similarly-sized proteins (1447 vs 1365 amino acids;
166 vs 156 kDa) sharing significant homology throughout their
predicted protein sequences (22% identity, 42% similarity; Fig. 1).
Moreover, like KRE5, CaKRE5 is predicted to possess an amino-terminal
signal peptide required for translocation into the secretory pathway,
and a C-terminal HDEL sequence which facilitates the retention of
soluble secretory proteins within the endoplasmic reticulum (ER).
Although CaKre5p is more homologous to S. pombe and metazoan UGGT
proteins throughout its C-terminal domain than to Kre5p, CaKre5p and
Kre5p are more related to each other over their remaining
sequence (approx. 1100 amino acids). This unique homology
between the two proteins, as well as their similar null phenotypes (see
below), suggests that CaKRE5 likely serves as the KRE5 counterpart in
C. albicans.
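
Percent identity and similarity figures such as those quoted above are derived from a pairwise alignment. The sketch below shows one common way to compute them; the text does not state which convention or similarity grouping was used, so the denominator (columns where neither sequence has a gap) and the residue groups are assumptions, and the toy sequences are hypothetical.

```python
# Illustrative conservative-substitution groups (an assumption, not from the source).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ", "AG"]


def _aligned_pairs(aligned_a, aligned_b, gap="-"):
    """Yield residue pairs from alignment columns where neither row is a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x != gap and y != gap:
            yield x, y


def percent_identity(aligned_a, aligned_b):
    """Percentage of ungapped columns holding identical residues."""
    pairs = list(_aligned_pairs(aligned_a, aligned_b))
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def percent_similarity(aligned_a, aligned_b):
    """Percentage of ungapped columns holding identical or same-group residues."""
    pairs = list(_aligned_pairs(aligned_a, aligned_b))
    if not pairs:
        return 0.0

    def similar(x, y):
        return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)

    return 100.0 * sum(similar(x, y) for x, y in pairs) / len(pairs)


# Hypothetical toy alignment (dashes are gaps).
identity = percent_identity("MKT-LV", "MRTALI")
similarity = percent_similarity("MKT-LV", "MRTALI")
```

Other conventions divide by the full alignment length or by the shorter sequence length, which lowers the reported percentages, so values are only comparable when the same convention is used throughout.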



CaALR1

CaALR1 encodes a 922 amino acid residue protein
sharing strong identity to both ALR1 (1.0e-180) and ALR2 (1.0e-179;
Fig. 2). Like these proteins, CaALR1 possesses a C-terminal
hydrophobic region which likely functions as two transmembrane
anchoring domains ( ). CaALR1 shares only limited homology, however,
to two highly homologous regions common to ALR1 and ALR2; neither
the N-terminal 250 amino acids of CaALR1 nor its last 50 amino acids
C-terminal to the hydrophobic domain share strong similarity to ALR1 or
ALR2. In addition, CaALR1 possesses two unique sequence extensions
within the CorA homology region (one 38 a.a. in length, the other 16 a.a.
long) not found in either ALR1 or ALR2. Protein database searches
identify an S. pombe hypothetical protein sharing strong homology to
CaALR1 (2.7e-107); however, no similarity to higher eukaryotic proteins
was detected.



CaCDC24 

Sequence analysis of the CaCDC24 gene product
reveals extensive homology to both Cdc24p (3.8e-97) and Scd1p
(1.0e-59; Fig. 3) throughout their entire open reading frames. Although
substantial similarity exists between CaCdc24p (and both Cdc24p and
Scd1p) and a large number of metazoan proteins (up to 1.8e-13), in
each case this homology is restricted to either the nucleotide exchange
domain (dbl domain) or a domain common to signal transduction
components (PH domain). Extensive database searches reveal that
both the N-terminal (250 a.a.) and C-terminal (300 a.a.) regions of
CaCdc24p are exclusively conserved within this fungal family of
homologs.



Disruption of CaKRE5, CaALR1, and CaCDC24

Experimental strategy

Disruption of CaKRE5 was performed using the
hisG-CaURA3-hisG "URA-blaster" cassette constructed by Fonzi and
Irwin and standard molecular biology techniques (1, and references
within). A cakre5::hisG-CaURA3-hisG disruption plasmid was
constructed by deleting a 780 bp BamHI-BglII DNA fragment from the
library plasmid isolate, pCaKRE5, and replacing it with a 4.0 kb
BamHI-BglII DNA fragment containing the hisG-CaURA3-hisG module
from pCUB-6. This CaKRE5 disruption plasmid is deleted of DNA
sequence encoding amino acids 971-1231, which encompasses
approx. 50% of the UGGT homology domain. This CaKRE5 disruption
plasmid was then digested with SphI prior to transformation.

A CaALR1 disruption allele was constructed by first
subcloning a 7.0 kb CaALR1 BamHI-SalI fragment from YEp352-library
isolate pCaALR1 into pBSKII+. An 841 bp CaALR1 HindIII-BglII fragment
was then replaced with a 4.0 kb hisG-CaURA3-hisG DNA fragment
digested with HindIII and BamHI from pBSK-hisG-CaURA3-hisG. This
CaALR1 disruption allele, which is lacking DNA sequences encoding
amino acids 20-299, was digested using BamHI and SalI prior to
transformation.
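
The deleted fragment sizes and the amino acid ranges quoted for the CaKRE5 (780 bp, residues 971-1231) and CaALR1 (841 bp, residues 20-299) disruption alleles can be cross-checked with simple codon arithmetic. The helper below is an illustrative sketch with a hypothetical function name. It counts an inclusive residue range at three nucleotides per codon, and restriction sites need not fall on codon boundaries, so agreement to within a few base pairs is what one would expect.

```python
def coding_span_bp(first_aa, last_aa):
    """Nucleotides needed to encode residues first_aa..last_aa inclusive."""
    if last_aa < first_aa:
        raise ValueError("last_aa must not precede first_aa")
    return 3 * (last_aa - first_aa + 1)


# Residue ranges and fragment sizes as stated in the text.
kre5_span = coding_span_bp(971, 1231)   # compare with the 780 bp BamHI-BglII fragment
alr1_span = coding_span_bp(20, 299)     # compare with the 841 bp HindIII-BglII fragment

kre5_difference = kre5_span - 780
alr1_difference = alr1_span - 841
```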

A CaCDC24 insertion allele was constructed by first
deleting a 0.9 kb KpnI fragment from YEp352-library isolate pCaCDC24
to remove CaCDC24 upstream sequence containing BamHI and BglII
restriction sites which obstruct the insertion of the hisG-CaURA3-hisG
module. The 4.0 kb BamHI-BglII hisG-CaURA3-hisG fragment from
pCUB-6 was then ligated into a unique BglII site in pCaCDC24-KpnIΔ.
The resulting plasmid, pcacdc24::hisG-CaURA3-hisG, possessing an
insertion allele within CaCDC24 at amino acid position 306, was digested
with KpnI and SalI prior to transformation.

CaKRE5, CaALR1, and CaCDC24 disruption plasmids
were digested as described above, and transformed into C. albicans
strain CAI-4 using the lithium acetate method. Transformants were
selected as Ura+ prototrophs on YNB + Casa plates. Heterozygous
disruptants were identified by PCR (data not shown), verified by
Southern blot (see below), and prepared for a second round of gene
disruption by selecting for 5-FOA resistance. To assess the null
phenotype of each gene, a second round of transformations using
heterozygous CaKRE5/cakre5, CaALR1/caalr1, and CaCDC24/cacdc24
ura3- strains was performed as outlined above.

Correct integration of the hisG-CaURA3-hisG module
into CaKRE5, CaALR1, and CaCDC24 and CaURA3 excision from
heterozygous strains were verified by Southern blot analysis using the
following probes:

(1a) a 1.25 kb XbaI-KpnI fragment digested from
pCaKRE5 containing N-terminal coding sequence of CaKRE5;

(1b) a 1.7 kb PCR product containing coding
sequence from amino acid 404 and 3' flanking sequences of CaALR1;

(1c) a 778 bp PCR product containing CaCDC24
coding sequence from amino acids 154-430;

(2) a 783 bp PCR product which contains the entire
CaURA3 coding region;

(3) a 898 bp PCR product encompassing the entire
Salmonella typhimurium hisG gene.

Genomic DNA from CaKRE5-disrupted strains was digested with
HindIII, and EcoRI was used to digest genomic DNA from CaALR1-
and CaCDC24-disrupted strains.



Results 

Southern blot analysis revealed that the
cakre5::hisG-CaURA3-hisG disruption fragment integrated precisely into
the wild type locus (Fig 4D) after the first round of transformations.
Both a 5.0 kb wild type band and a 9.0 kb band diagnostic of the
CaKRE5-disrupted allele were detected using the CaKRE5 probe (Fig
4D). The 9.0 kb band was also detected with both the hisG and CaURA3
probes, confirming disruption of the first CaKRE5 copy. Successful
excision of the CaURA3 gene by growth on 5-FOA was validated by 1)
a predicted shift in size of the CaKRE5 disruption fragment from 9.0 kb
to 6.0 kb when probed with either CaKRE5 or hisG probes and 2) the
inability of the CaURA3 probe to recognize this fragment and the resulting
strain having reverted to ura3- auxotrophy.

To determine whether CaKRE5 is essential, the
transformation was repeated in two independently-derived
CaKRE5/cakre5::hisG, ura3-/ura3- heterozygous strains. A total of 36
Ura+ colonies (24 small and 12 large colonies after 3 days of growth)
were analyzed by PCR using oligonucleotides which amplify a 2.5 kb
wild-type fragment that spans the BamHI and BglII sites bordering the
disrupted region. All colonies were determined to contain this 2.5 kb
wild-type fragment but to lack the 2.8 kb cakre5::hisG allele, consistent
with the cakre5::hisG-CaURA3-hisG module integrating at the
previously disrupted locus. Southern blot analysis using the 3 different
probes independently confirmed 4 such Ura+ transformants as bona fide
CaKRE5/cakre5::hisG-CaURA3-hisG heterozygotes. If the gene were
not essential, then 50% of the recovered disruptants would be expected
to have integrated into the remaining wild-type CaKRE5 allele, giving
homozygous disruptants, with the other 50% remaining heterozygous.
For example, this is the case when disrupting the second wild-type
allele of CaKRE1, a gene shown not to be essential in S. cerevisiae. An
equal number of heterozygous and homozygous strains result from this
second round of transformations (data not shown). However, the absence
of any homozygous CaKRE5 disrupted transformants among
the 36 Ura+ transformants analyzed supports our contention that
CaKRE5 is essential in C. albicans.
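
The strength of this argument can be gauged with a simple probability estimate. The sketch below assumes, as an idealization not stated in the text, that transformants are independent and that each has exactly the 50% chance of being a homozygous disruptant described above for a non-essential gene; the function name is hypothetical.

```python
def prob_no_homozygotes(n_transformants, p_homozygous=0.5):
    """Probability that none of n independent transformants is a homozygous
    disruptant, if each would be homozygous with probability p_homozygous."""
    if not 0.0 <= p_homozygous <= 1.0:
        raise ValueError("p_homozygous must lie between 0 and 1")
    if n_transformants < 0:
        raise ValueError("n_transformants must be non-negative")
    return (1.0 - p_homozygous) ** n_transformants


# 36 Ura+ transformants were analyzed for CaKRE5, none of them homozygous.
p_kre5 = prob_no_homozygotes(36)
```

Under these assumptions the chance of seeing no homozygous disruptant among 36 transformants of a non-essential gene is 2 to the power of -36, on the order of 1 in 10^10. Even if the true targeting frequency at the wild-type allele were well below 50%, the outcome would remain very unlikely for a dispensable gene.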

CaALR1

Southern blot analysis of CaALR1 first round
transformants confirmed correct integration of the
caalr1::hisG-CaURA3-hisG disruption module as judged by an
appropriately sized disruption band of 5.7 kb, and a wild-type fragment
predicted to be >9.0 kb detected by the CaALR1 probe (Fig 4E). This
5.7 kb band was also detected with both the hisG and CaURA3 probes,
confirming disruption of one copy of CaALR1. Southern blotting
confirmed excision of the CaURA3 gene by growth on 5-FOA, as the
CaALR1 probe detected an expected 5.0 kb fragment due to the
absence of CaURA3. Moreover, this 5 kb caalr1::hisG band was also
detected using the hisG probe but not with the CaURA3 probe (Fig
4E).

Determination of the CaALR1 null phenotype was
performed as described for CaKRE5. However, as it has been reported
that the inviability of the ALR1 null mutation in S. cerevisiae can be
partially suppressed by supplementing the medium with MgCl2, we
performed the second transformation by selecting for Ura+ colonies
on 500 mM MgCl2-containing medium as well as on standard Casa plates.
35 Ura+ colonies of various size (22 from MgCl2-supplemented plates)
were analyzed by PCR to confirm caalr1::hisG-CaURA3-hisG integration.
The second allele from each of these 35 transformants was determined
to be wild-type by PCR using oligos that span the insertion and produce
a wild-type 1.6 kb product and not the slightly larger 1.75 kb product of
the caalr1::hisG allele (note: this was done twice, run far in 2% agarose
and alongside caalr1::hisG control genomic DNA, which did run noticeably
slower than the 35 unknowns). Southern blot analysis using the 3
different probes independently confirmed 4 such Ura+ transformants as
CaALR1/caalr1::hisG-CaURA3-hisG heterozygotes. Our inability to
identify a homozygous CaALR1 disrupted transformant among the 35
Ura+ colonies analyzed supports the claim that CaALR1 is essential in
C. albicans.

CaCDC24 

Southern blot analysis of CaCDC24 first round 
transformants using the CaCDC24 gene probe confirmed correct 
integration of the cacdc24::hisG-CaURA3-hisG insertion fragment, as 
both 2.55 kb and 3.7 kb fragments, diagnostic of the insertional allele, 
were detected in addition to the 2.2 kb wild-type CaCDC24 fragment 
(Fig. 4F). Moreover, both 2.55 kb and 3.7 kb fragments were detected 
using CaURA3 and hisG probes. Excision of CaURA3 from the resulting 
heterozygote was verified by 1) detecting a single 3.3 kb fragment unique 
to 5-FOA resistant colonies using the CaCDC24 or hisG probes, and 
2) the failure to detect this band using the CaURA3 probe (Fig. 4F). 
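
The fragment sizes reported above lend themselves to a simple lookup when scoring blots. The following Python sketch is illustrative only: the function names and the 0.1 kb matching tolerance are assumptions introduced here, and only the fragment sizes detected with the CaCDC24 probe (2.2, 2.55, 3.7 and 3.3 kb) come from the analysis described above.

```python
# Expected fragment sizes (kb) with the CaCDC24 probe, as reported in the text.
WILD_TYPE = (2.2,)
INSERTION = (2.55, 3.7)  # cacdc24::hisG-CaURA3-hisG allele
EXCISED = (3.3,)         # cacdc24::hisG allele after 5-FOA selection


def has_bands(observed, expected, tol=0.1):
    """True if every expected band matches an observed band within tol kb."""
    return all(any(abs(o - e) <= tol for o in observed) for e in expected)


def call_genotype(observed_kb, tol=0.1):
    """Hypothetical helper: classify a lane from its observed band sizes."""
    wild_type = has_bands(observed_kb, WILD_TYPE, tol)
    disrupted = (has_bands(observed_kb, INSERTION, tol)
                 or has_bands(observed_kb, EXCISED, tol))
    if wild_type and not disrupted:
        return "wild-type"
    if wild_type and disrupted:
        return "heterozygote"
    if disrupted:
        return "homozygous disruptant"
    return "uninterpretable"


print(call_genotype([2.2, 2.55, 3.7]))  # first round transformant pattern
print(call_genotype([2.2, 3.3]))        # pattern after CaURA3 excision
```

A lane lacking the 2.2 kb wild-type fragment but showing a disruption pattern would be scored as a homozygous disruptant under this scheme.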

A second round of transformations using the above 
described CaCDC24 heterozygote was performed. 28 Ura+ colonies of 
various size were analyzed by PCR to confirm 
cacdc24::hisG-CaURA3-hisG integration. The second allele from each 
of these 28 transformants was determined to be wild-type by PCR 
using oligos that span the insertion and produce a wild-type 0.5 kb 
product and not the 1.6 kb product of the cacdc24::hisG allele. Southern 
blot analysis using the 3 different probes independently confirmed 4 such 
Ura+ transformants as CaCDC24/cacdc24::hisG-CaURA3-hisG 
heterozygotes. Our inability to identify a homozygous CaCDC24-disrupted 
transformant among these 28 Ura+ colonies analyzed strongly 
suggests that CaCDC24 is essential in C. albicans, as it is known to 
be in S. cerevisiae. 
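
The weight of these negative results can be put in rough numerical terms. The sketch below assumes, as stated for CaKRE5 above, that half of the second-round transformants would be homozygous disruptants if the gene were dispensable, and further assumes that transformants are independent events. The function name and the independence assumption are introduced here for illustration and are not part of the analysis described above.

```python
def prob_no_homozygote(n_transformants: int, p_homozygous: float = 0.5) -> float:
    """Chance of seeing no homozygote among n independent transformants
    when each one is homozygous with probability p_homozygous."""
    if n_transformants < 0:
        raise ValueError("n_transformants must be non-negative")
    if not 0.0 <= p_homozygous <= 1.0:
        raise ValueError("p_homozygous must lie between 0 and 1")
    return (1.0 - p_homozygous) ** n_transformants


# Numbers of Ura+ transformants analyzed, taken from the text above.
analyzed = {"CaKRE5": 36, "CaALR1": 35, "CaCDC24": 28}

for gene, n in analyzed.items():
    p = prob_no_homozygote(n)
    print(f"{gene}: P(no homozygote if gene were dispensable) = {p:.2e}")
```

Under these assumptions each probability is below 1e-8, which is in keeping with the interpretation that the absence of homozygous disruptants reflects an essential function rather than chance.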

The present invention is illustrated in further detail by the 
following non-limiting examples. 

EXAMPLE I 

In vivo Screening Methods for Specific Antifungal Agents 

Candida albicans strains with reduced or elevated levels 
of the CaKRE5, CaALR1, or CaCDC24 gene product permit screens 
for differential sensitivity or resistance to drugs or compounds from 
natural or artificial sources that inhibit these proteins. Compounds that 
show such a differential inhibition of growth of these Candida albicans 
strains would be specific inhibitors of CaKRE5-, CaALR1-, or 
CaCDC24-dependent processes and can be further evaluated as specific 
antifungal drugs. 
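
One way such differential sensitivity could be scored is sketched below in Python. It assumes that an IC50 has been measured for each compound against a strain with reduced target levels and against a strain with elevated target levels; the class and function names, the 4-fold cut-off and the example values are all hypothetical and serve only to illustrate the comparison.

```python
from dataclasses import dataclass


@dataclass
class Ic50Pair:
    compound: str
    ic50_reduced: float   # IC50 against the strain with reduced target level
    ic50_elevated: float  # IC50 against the strain with elevated target level


def fold_shift(pair: Ic50Pair) -> float:
    """Ratio of the two IC50s; above 1 the reduced-level strain is more sensitive."""
    if pair.ic50_reduced <= 0 or pair.ic50_elevated <= 0:
        raise ValueError("IC50 values must be positive")
    return pair.ic50_elevated / pair.ic50_reduced


def target_specific_hits(pairs, min_fold=4.0):
    """Compounds whose potency shifts by at least min_fold (assumed cut-off)."""
    return [p.compound for p in pairs if fold_shift(p) >= min_fold]


# Made-up measurements, for illustration only.
screen = [
    Ic50Pair("compound_A", ic50_reduced=0.5, ic50_elevated=8.0),
    Ic50Pair("compound_B", ic50_reduced=2.0, ic50_elevated=2.5),
]
print(target_specific_hits(screen))
```

A compound that inhibits both strains equally, such as compound_B here, would be set aside as a non-specific growth inhibitor.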

Expression of a functional CaKRE5, CaALR1, or 
CaCDC24 in a S. cerevisiae kre5, alr1 or cdc24 mutant, respectively, 
allows replacement of the S. cerevisiae gene with that of its C. 
albicans counterpart and thus permits screening for specific inhibitors 
in a S. cerevisiae background where the additional experimental 
tractability of the organism permits additional sophistication of the 
screens. For example, drugs which block CaKre5p in S. cerevisiae confer 
K1 killer toxin resistance, and this phenotype can be used to screen 
for such compounds. Similarly, drugs/compounds could be screened 
which inactivate heterologously-expressed CaCDC24 and consequently 
disrupt its association with Rsr1p or Cdc42p in a two hybrid assay. 
Alternatively, CaCDC24 function could be monitored in a screen for 
compounds able to disrupt pseudohyphal formation in a 
CaCDC24-dependent manner. A whole cell drug screening assay based 
on CaALR1 function could similarly be envisaged. For example, 
CaALR1-dependent influx of 57Co2+ in a S. cerevisiae alr1 mutant 
suppressed by supplementary Mg2+ could be monitored to identify 
compounds which specifically block the import of divalent cations. 



EXAMPLE II 

In vitro Screening Methods for Specific Antifungal Agents 

1. Use of an in vitro assay to synthesize β-(1,6)-glucan. 

In such an assay the incorporation of labelled 
glucose from UDP-glucose into a product that can be 
immunoprecipitated or immobilized with β-(1,6)-glucan antibodies is 
measured. The specificity of this synthesis can be established by 
showing its dependence on CaKre5p, and its digestion with 
β-(1,6)-glucanase. 

Drugs which block this in vitro synthesis reaction 
block β-(1,6)-glucan synthesis and are candidates for antifungal 
drugs; some may inhibit Kre5p, others may inhibit other steps in the 
synthesis of this polymer. 

2. Use of a specific in vitro assay for CaKre5p. 

CaKre5p has amino-acid sequence similarities to 
UDP-glucose glycoprotein glucosyltransferases. The CaKre5p protein 
can be produced heterologously or from Candida albicans and an 
assay devised using a range of substrates that are a subset of 
glycoproteins that are in the wall with GPI modifications that are 
β-(1,6)-glucosylated. These acceptor substrates would be obtained 
from a strain of S. cerevisiae that carries a kre5 disruption, and so 
have failed to receive the glucose from the UDP-glucose donor 
in vivo. Such an assay measuring CaKre5p-dependent protein 
glycosylation can be used to screen for inhibitors of CaKre5p. 
Alternatively, it would be possible to screen for compounds that bind 
to immobilised CaKre5p. Such inhibitors and Kre5p-binding compounds 
would be candidates for drugs specifically inhibiting this fungal-specific 
process. 

CDC24 has been biochemically demonstrated to 
encode a GDP-GTP nucleotide exchange factor (GEF) required to 
convert Cdc42p to a GTP-bound state. An in vitro assay to 
measure CaCdc24p-dependent activation of Cdc42p could be used to 
screen for inhibitors of CaCdc24p. This could be accomplished by 
directly measuring the percentage of GTP versus GDP bound by 
Cdc42p. Alternatively, CaCdc24p function could be determined indirectly 
by measuring Cdc42p-GTP dependent activation of Ste20p kinase 
activity. 
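
As an illustration of the direct readout, the fraction of Cdc42p in the GTP-bound state and the resulting inhibition by a test compound could be computed as sketched below. The function names, the control layout (a reaction without CaCdc24p as background and a reaction without compound as the uninhibited reference) and the example signal values are assumptions made for illustration, and are not part of the assay as described.

```python
def percent_gtp_bound(gtp_signal: float, gdp_signal: float) -> float:
    """Percentage of nucleotide-bound Cdc42p that carries GTP."""
    total = gtp_signal + gdp_signal
    if total <= 0:
        raise ValueError("no nucleotide signal detected")
    return 100.0 * gtp_signal / total


def percent_inhibition(with_compound: float, no_compound: float,
                       no_gef: float) -> float:
    """Inhibition of GEF-dependent GTP loading, scaled between the two controls.

    All three arguments are percent-GTP-bound values.
    """
    window = no_compound - no_gef
    if window <= 0:
        raise ValueError("GEF control does not exceed the no-GEF background")
    return 100.0 * (no_compound - with_compound) / window


# Made-up signals (arbitrary units), for illustration only.
background = percent_gtp_bound(gtp_signal=10.0, gdp_signal=90.0)   # no CaCdc24p
uninhibited = percent_gtp_bound(gtp_signal=60.0, gdp_signal=40.0)  # CaCdc24p only
treated = percent_gtp_bound(gtp_signal=35.0, gdp_signal=65.0)      # plus compound

print(percent_inhibition(treated, uninhibited, background))
```

Scaling between the two controls expresses inhibition relative to the CaCdc24p-dependent part of the signal only, so that intrinsic nucleotide exchange by Cdc42p does not count towards it.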

EXAMPLE III 

The use of CaALR1, CaKRE5, and CaCDC24 in PCR-based 
diagnosis of fungal infection 

Polymerase chain reaction (PCR) based assays 
provide a number of advantages over traditional serological testing 
methodologies in diagnosing fungal infection. Issues of epidemiology, 
fungal resistance, reliability, sensitivity, speed, and strain identification 
are limited by the spectrum of primers and probes available. The 
CaKRE5, CaALR1, and CaCDC24 gene sequences enable the design 
of novel primers of potential clinical use. In addition, as CaALR1 is 
thought to localize to the plasma membrane and extend out into the 
periplasmic space/cell wall, this extracellular domain could act as a 
serological antigen to which antibodies could be raised and used in 
serological diagnostic assays. 
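
Candidate primer pairs could be checked in silico before use. The Python sketch below locates a forward primer and the annealing site of a reverse primer on a template and reports the expected product length. The template and primer sequences are invented placeholders, not taken from Figures 1A, 2A or 3A, and exact matching without mismatches is a simplifying assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Length of the product primed by forward/reverse on template, or None."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer appears on the top strand as its reverse complement.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start


# Placeholder sequences for illustration only (not CaKRE5, CaALR1 or CaCDC24).
template = "GGATCCATGACCGATTACAAGGTTCTGAACTGGCATGCAAGCTT"
forward_primer = "ATGACCGATTAC"
reverse_primer = "CTTGCATGCCAG"

print(amplicon_length(template, forward_primer, reverse_primer))
```

A pair for which the function returns None does not prime on the given template under the exact-match assumption and would be discarded.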



EXAMPLE IV 

Plasmid-based reporter constructs which measure Kre5p, Alr1p, 
or Cdc24p inactivation 

Transcriptional profiling of kre5, alr1, and cdc24 
mutants in S. cerevisiae can be used to identify genes which are 
transcriptionally induced/repressed specifically under conditions of 
KRE5, ALR1, or CDC24 inactivation or overproduction. The identification 
of promoter elements from genes responsive to the loss of KRE5, ALR1, 
or CDC24 activity offers practical utility in drug screening assays to 
identify compounds which specifically inactivate these targets. For 
example, a chimeric reporter gene (e.g., lacZ, GFP) whose expression 
would be induced/repressed by such a promoter would reflect activity of 
Kre5p, and could be used for high-throughput screening of compound 
libraries. Further, a group of promoters showing such regulated 
expression would allow a specific fingerprint or transcriptional profile to 
be built for the inhibition or overproduction of the ALR1, CDC24, or 
KRE5 genes. This would allow a reporter set to be constructed that 
could be used for high-throughput screening of compound libraries, giving 
a specific tool for screening compounds which inhibit these gene 
products. 
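
The fingerprint comparison could, for instance, take the form of a correlation between the reporter response to a compound and reference profiles obtained from the mutants. The sketch below is one possible scoring scheme and not a method described here: the five-reporter panel, the log2 fold-change values, the use of the Pearson coefficient and the 0.8 cut-off are all assumptions chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a profile with no variation cannot be correlated")
    return cov / sqrt(vx * vy)


def best_match(profile, references, min_r=0.8):
    """Name of the reference fingerprint best correlated with profile, or None."""
    scores = {name: pearson(profile, ref) for name, ref in references.items()}
    name = max(scores, key=scores.get)
    return name if scores[name] >= min_r else None


# Hypothetical log2 fold changes of five reporters upon loss of each gene.
reference_fingerprints = {
    "KRE5": [2.0, 1.5, -1.0, 0.0, 0.0],
    "ALR1": [0.0, -1.0, 2.0, 1.5, 0.0],
    "CDC24": [-1.0, 0.0, 0.0, 2.0, 1.5],
}

# Hypothetical reporter response to a test compound.
compound_x = [1.0, 0.75, -0.5, 0.0, 0.0]
print(best_match(compound_x, reference_fingerprints))
```

A compound whose profile matches none of the reference fingerprints above the cut-off returns None and would not be assigned to any of the three targets.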
CONCLUSION 

We have identified the CaKRE5, CaALR1, and 
CaCDC24 genes from C. albicans and validated their utility as novel 
antifungal drug targets by demonstrating their essential nature by gene 
disruption. Although the precise function of their gene products 
remains to be determined, we have shown that these proteins are 
essential for viability. Genome database searches fail to detect 
significant homology to these genes in metazoans, suggesting that 
compounds which inactivate these fungal-specific drug targets are 
less likely to display toxicity to human cells. KRE5 and CDC24 
are unique genes in S. cerevisiae and, irrespective of whether they are 
members of gene families in C. albicans, they retain an essential 
function. Alr1p is part of a 3 member gene family in S. cerevisiae, and 
sequence similarity to Alr2p has been identified (Stanford Sequencing 
Project); however, the essential role of CaAlr1p in C. albicans and its 
predicted extracellular location offer the potential to screen for novel 
antifungal compounds which need not enter the cell, circumventing 
issues of compound delivery and drug resistance. 

We have shown that the Candida albicans CaKRE5 
gene is essential, encodes a protein product with significant sequence 
similarity to S. cerevisiae Kre5p, and is involved in β-(1,6)-glucan 
synthesis, as there is a reduced amount of the polymer 
in a heterozygous CaKRE5/Cakre5 disruption relative to the 
CaKRE5/CaKRE5 homozygote. In addition, the phenotype of the 
heterozygous CaKRE5/Cakre5 disruption mutant cells resembles that of 
kre5 deletions in S. cerevisiae: clumps of swollen cells with cytokinesis 
and cell separation defects (data not shown). 

Thus, in the present invention we reduce to practice the 
use of CaKRE5, CaALR1, and CaCDC24 in Candida albicans as 
essential antifungal targets, and extend in a non-obvious way the use of 
these genes to a pathogenic fungal species as targets for screening for 
drugs specifically directed against fungal pathogens. 

Although the present invention has been described 
hereinabove by way of preferred embodiments thereof, it can be modified, 
without departing from the spirit and nature of the subject invention as 
defined in the appended claims. 
WHAT IS CLAIMED IS: 

1. An isolated DNA sequence selected from the group 
consisting of: 

a) fungal specific gene of C. albicans termed CaKRE5; 

b) fungal specific gene of C. albicans termed CaALR1; 

c) fungal specific gene of C. albicans termed CaCDC24; 

d) a part or oligonucleotide derived from a), b) or c); 

e) a nucleotide sequence complementary to any of the 
nucleotide sequences of a) - d); and 

f) a sequence which hybridizes under high stringency 
conditions to any of the nucleotide sequences of a) - e). 



2. The isolated DNA sequence of claim 1, wherein said 
sequence of CaKRE5 is as set forth in Figure 1A. 

3. The isolated DNA sequence of claim 1, wherein said 
sequence of CaALR1 is as set forth in Figure 2A. 

4. The isolated DNA sequence of claim 1, wherein said 
sequence of CaCDC24 is as set forth in Figure 3A. 

5. A method of selecting a drug that modulates the 
activity of a protein encoded by said CaKRE5 of claim 2 comprising: 

a) incubating a candidate drug with said protein; 

b) determining the activity of said protein in the presence 
of said candidate drug, 

wherein a potential drug is selected when the activity of said protein in 
the presence of said candidate drug is measurably different than in the 
absence thereof. 

6. A method of selecting a drug that modulates the 
activity of a protein encoded by said CaALR1 of claim 3 comprising: 

a) incubating a candidate drug with said protein; 

b) determining the activity of said protein in the 
presence of said candidate drug, 

wherein a potential drug is selected when the activity of said protein in 
the presence of said candidate drug is measurably different than in the 
absence thereof. 

7. A method of selecting a drug that modulates the 
activity of a protein encoded by said CaCDC24 of claim 4 comprising: 

a) incubating a candidate drug with said protein; 

b) determining the activity of said protein in the 
presence of said candidate drug, 

wherein a potential drug is selected when the activity of said protein in 
the presence of said candidate drug is measurably different than in the 
absence thereof. 

8. An isolated nucleic acid molecule consisting of 10 
to 50 nucleotides which specifically hybridizes to RNA or DNA of claim 1, 
2, 3 or 4, wherein said nucleic acid molecule is or is complementary to a 
nucleotide sequence consisting of at least 10 consecutive nucleotides 
from said nucleic acid sequence set forth in Figures 1A, 2A or 3A. 

9. A method of detecting CaKRE5, CaALR1 or 
CaCDC24 in a sample comprising: 

a) contacting said sample with a nucleic acid molecule 
according to claim 8, under conditions such that hybridization occurs; and 

b) detecting the presence of said molecule bound to 
said CaKRE5, CaALR1 or CaCDC24 nucleic acid. 

10. A purified CaKRE5 polypeptide or an epitope-bearing 
portion thereof. 

11. A purified CaALR1 polypeptide or an epitope-bearing 
portion thereof. 

12. A purified CaCDC24 polypeptide or an epitope-bearing 
portion thereof. 

13. The purified CaKRE5 polypeptide according to claim 
10, comprising an amino acid sequence at least 90% identical to the 
amino acid sequence as set forth in Figure 1B. 

14. The purified CaALR1 polypeptide according to claim 
11, comprising an amino acid sequence at least 90% identical to the 
amino acid sequence as set forth in Figure 2B. 

15. The purified CaCDC24 polypeptide according to 
claim 12, comprising an amino acid sequence at least 90% identical to 
the amino acid sequence as set forth in Figure 3B. 

16. An antibody having specific binding affinity to the 
polypeptide or epitope-bearing portion thereof according to claim 10. 

ABSTRACT OF THE DISCLOSURE 

The invention relates to the identification and disruption 
of essential fungal specific genes isolated in the yeast pathogen Candida 
albicans, namely CaKRE5, CaALR1 and CaCDC24, and to the use thereof 
in antifungal diagnosis and as essential antifungal targets in a fungal 
species for antifungal drug discovery. 
Figure 1A. Nucleotide sequence of CaKRE5 with the deduced amino acid sequence (sequence listing not legible in this copy). 

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Figure 2 A. 

Vi'l l'I*3CCCCATTCACTATT 

1 

21 CAGTATTfnATATAATATATAA^^ 

136 tatattcttcattaattaaacac^^ i iA ATATTAACATi«TCC ATri rrrrrr n iacccaacctatcaaaattatttt 

351 TCTTCTCTAACAACTATAAW^^ 

Met S.r A*p ST Clu S.r Tyr Tyr Gin S.r Thr Thr A-o Cln Pro IX. Pro An, S.r A-P Clu Vel Leu A-P A-p Hi* Arg A-n 

366 ATO TCC GAT AST GAA ACT TAT TAT CAA AAT TCA ACT ACT AAT CAA CCT ATT OCT ACA TCT CAT CAA CTA W5 GAT GAT CAT AGA AAT 

Cln IX. Tht A-n Asp cy. Al* IX. s.r A-p s.r Clu A-p olu L« Glu i-u Ly. ser clu L«u clu S.r clu v.l v.l Ly. s.r olu 

453 CAA ATC ACT AAT GAT TCT CCC ATT ACT GAT ACT GAA CAT GAG TTC CAA TTA AAA TCA GAA TTA GAA TCA GAA CTT CTA AAA ACC GAA 
ly- cln Gin Cln Hi* Hi. Gin Glu IX. Thr Ser A-p A-n AX. Ly- Pro L.u Thr Arg Ly. S*r cXy Ser S.r IX. Ly- Ly- Ly. Ser 

540 AAA CAA CAA CAA CAT CAT CAA GAG ATT ACA TCA GAT AAT CCT AAA CCA TTC ACT CCT AAA TCT OCT TCT TCA ATT AAG AAA AAA TCT 

A— n Leu Thr Asp Lys Aop Arg lie Thr Aan Pro Met ser Leu Ser Cly Cly Asp A*p Thr IX. A.n Ser cly Hi. Ly. An Arg A-n 

£27 XAT CTT ACC GAT AAA GAT AGA ATT ACC AAC CCT ATO ACT TTA TCT OCT CCT GAT GAT ACT ATT AAC ACC CCT CAC AAA AAT CCT AAT 
Tyr A-n Met Ser Ser Leu Arg Ly. A-p Phe Tyr Leu Ly- A-p Asn Thr Asp Asp Ash Ser Thr A-n A-n Hi. Thr Hi. Leu Al* He 

714 TAT AAC ATG ACT TCA TTA CGT AAA GAT TTT TAT TTA AAA GAT AAT ACT GAC CAC AAT TCT ACT AAT AAT CAT ACT CAT CTT CCA ATP 
Pro lie Pro lie Pro IX. Pro Thr Pro IX. II. Thr A-n Ala A-n Ly. Ser Ary Ary Ly- Ser Gin Leu Olu A-n Leu Pro Pro Leu 

801 CCA ATT CCA ATT CCA ATT CCA ACC CCA ATT ATT ACT AAT OCT AAT AAA TCA AGA AGA AAA TCT CAA TTG GAA AAT TTA CCT CCA TTA 

lie Ly. Ly. Ly- Thr II. Gly Arg A-n A-n S.r A-n A-n Ph. clu A-n A-p Leu v.l S«r Pro Met Thr Ly- Met Ly- Thr A-n A-p 
BBS ATT AAA AAC AAA ACA ATT CCT CCT AAT AAT TCT AAT AAT TTT GAA AAT GAT TTA CTT ACT CCC ATG ACA AAA ATG AAA ACT AAT CAT 
Ser Clu Asp He Thr A-n Thr Ser Thr Thr Ale A-n Hi- Met Ly- Leu Cly IX. Cly Al* Thr Thr Leu cly V.1 Gly Thr Gly Thr 
375 ACT GAA GAT ATT ACT AAT ACT ACC ACC ACT CCT AAT CAT ATG AAA CTT CCT ATT GOT CCT ACA ACC CTT CGT CTT OCA ACT GOT ACT 
Ul Thr Al. Thr Ale Thr AX. Thr Ala AX. Ale Cly Arg Arg Pro Ser Arg S.r s.r lie Asp Ser Clu Al. A-p Ser Hi- Ale S.r Arg 

Q 1062 ACC<rcACTCCCACTCCCACTCCTCCTGCT 

M= s« T s. r cm Clu Thr Clu Clu A-p VI cy- Phe Pro Met V*l Cly A-p Hi. lie Arg V*X A-n cXy II. A-p Ph. A-p GXu IX. A-p 

\d 1149 TCA TCT CAA CAA ACT GAA CAA CAT CTT TCT TTT CCT ATG CTT CGT CAT CAT ATT AGA CTT AAT CCA ATP GAT TTT GAT GAA ATT GAT 

fV| olu Phe 11. Ary Olu Clu Arg Glu Clu Al. Tyr Leu Cln Ly- Cln Met IX. Al. Ly. A-n IX. Leu Arg II. A-p Glu Phe Gin A-n 

m 1236 GAA TTT ATT AGA GAA GAA AGA GAA GAA CCT TAT TTA CAA AAA CAA ATG ATT CCT AAA AAT ATT CTG CCT ATT CAT GAA TTT CAA AAT 

Leu ser Ly- A-n A-n Thr Thr Ser Gly Ala S.r Arg Hi- Pro Tyr Hi- Hi- Hi- Ser A-n A-n A-n Ly- Ly- A-n A-n Gly Cly A-p 
~ 1323 CTT TCC AAA AAT AAT ACT ACT ACT CCT CCA TCT CGT CAT CCA TAT CAT CAT CAC ACT AAT AAT AAT AAA AAA AAT AAT GOT GOT GAT 

c i y cly cly s . r Ser Met Ale Ale Leu Ly. Tyr Thr Pro Ly. A-n IX. Leu Ly- Lys Thr Leu Ser Arg Ph. clu Ph. Thr Hi- Olu 

L, 1410 CCT GOT CCT TCT AGT ATO CCA OCA TTA AAA TAT ACT CCA AAA AAT ATT TTA AAC AAA ACA TTA TCA AGA TTT GAA TTT ACT CAT GAA 

ri A-n Ser Ser Ser Ser Glu Glu He Tyr Clu L.u Ly- Thr Ly- Cln cln Pro Pro Tyr Ly- Tyr A-p A-p cln Leu Ser L«i Thr S.r 

£ 1497 AAT TCT TCA TCT TCA GAA GAA ATT TAT GAA TTG AAC ACT AAA CAA CAA CCA CCT TOC AAA TAT GAT GAT CAA TTA TCA TTA ACT TCA 

5 ser Thr Ser Ser Thr Ser Gly Ser Gly s.r cly cln v.l Ly. Ph. Gly Gly AX. Arg IX. Ser A-p Gly lie A-n cly Gly Ser Leu 

Uj 1584 TCT ACA TCT TCT ACT TCT GGA TCT GGA TCT CCC CAC CTG AAA TTT CGT CCA CCA AGA ATT TCT GAT GG6 ATT AAT GGA GOT TCA TTA 

*0 Pro Asp Arg Phe S«r Leu Phe Hi- Ser Glu Ser Clu Glu Thr lie Hi- Al* Pro A-p lie Pro Ser Leu v.X Ser Pro Gly Gin S.r 

■*D 1<71 CCT GAT AGA TTT TCA CTT TTC CAT TCT GAA TCA GAA GAA ACT ATT CAT CCC CCC GAT ATT OCA TCA TTA CTA TCA CCA GOT CAA TCT 

VI Arg A-p Leu Phe Arg A-n Gly Glu Glu Thr Trp Trp Leu A-p Cy- Thr Cy- Pro Thr A-p S«r Clu Met Ly. Met Leu AX. Ly- 
1758 CTT CCA GAT TTA TTT AGA AAT GCT GAA GAA ACT TGG TOG TTA GAT TCT ACT TOT CCT ACT GAT TCC GAA ATG AAA ATG TTG CCC AAA 

AX. Phe CXy IX. Hi- Pro Leu Thr Ale Glu A-p II. Arg Met Gin Glu Thr Arg Glu Ly- Vnl Glu Leu Phe Ly- Ser Tyr Tyr Phe 
1845 CCA TTT GCT ATT CAT CCT TTA ACT CCT GAA CAT ATT CGA ATG CAA GAA ACT CCT GAA AAA CTT GAA TTA TTT AAA ACT TAT TAT TTT 

Val Cys Phe His Thr Phe Glu Ala Asp Lys Glu Ser Glu Asp Tyr Leu Glu Pro Ile Asn Val Tyr Ile Val Val Phe His Asp Gly 
1932 GTT TOT TTC CAT ACT TTT GAA GCT GAT AAA GAA TCT CAA GAT TAT TTA GAA CCC ATA AAT CTT TAT ATT CTT CTT TTC CAT GAT GCT 

Ile Leu Thr Phe His Phe Ser Pro Ile Ser His Pro Ala Asn Val Arg Arg Arg Val Arg Gln Leu Arg Asp Tyr Val Asp Val Ser 

2019 ATA TTA ACC TTC CAT TTT TCA CCA ATT TCT CAT CCA OCA AAT GTT ACA ACA ACA GTT CGT CAA TTG ACA GAT TAT CTC GAT CTT AGT 
Ala Asp Trp Leu Cys Tyr Ala Leu Ile Asp Glu Ile Thr Asp Gly Phe Ala Pro Val Ile His Gly Ile Glu Tyr Glu Ala Asp Ala 

2106 GCT GAT TOO TTA TOT TAT GCC TTA ATC GAT GAA ATT ACC GAT OCT TTT CCC CCC GTG ATT CAT GGA ATT GAA TAT CAA OCT GAT GCC 
Ile Glu Asp Ala Val Phe Thr Ala Arg Asp Thr Asp Phe Ser Ser Met Leu Gln Arg Ile Gly Glu Ser Arg Arg Lys Val Met Thr 

2193 ATT GAA GAT GCC CTT TTC ACT GCT AGA GAT ACT GAT TTT AGT AGT ATG TTA CAA AGA ATT OCT GAA TCA AGA ACA AAA CTC ATG ACT 
Leu Met Arg Leu Leu Ser Gly Lys Ala Asp Val Ile Lys Met Phe Ala Lys Arg Cys Gln Glu Glu Ala Asn Ser Ser Ser Gly Tyr 

2280 TTA ATG AGA ^"A TTA TCA GCT AAA OCT GAT CTC ATT AAA ATG TTP CCT AAA AGA TCT CAA GAA GAA OCT AAT TCT TCT TCT GOT TAT 

Tyr Gln Arg Gln Tyr Asn Leu Gln Gln Gln Gln Gln Gln Ala Pro Pro Pro Pro Pro Asn Pro Ile Ile Thr Ser Pro Ile Asn Ser 
2367 TAT CAA CGT CAA TAT AAC TTA CAA CAA CAA CAA CAA CAC CCC CCA CCA CCA CCA CCT AAT CCT ATT ATT ACT TCA CCA ATT AAT TCA 
Thr Leu Asn Leu Asn Ser Leu Gly Thr Ser Thr Gly Gly Gly Val Gly Val Gly Gly Ile Asn Phe Gly Pro Asn Pro Thr Gly Asn 
2454 ACT TPS AAT CTT AAT ACT TTA CCA ACT TCA ACT GCT OCA CCA CTA CCA CTA CCA CCA ATT AAT TTT OCT CCC AAT CCA ACT OCA AAT 

Asn Thr Asn Thr Asn Thr Asn Thr Thr Gly Ser Pro Ser Pro Pro Gln Gln Gln Gln Gln His Gly Ile Thr Asn Lys Ser Phe Pro 
2541 AAT ACT AAT ACT AAT ACT AAT ACT ACT OCT TCA CCT TCA CCA CCT CAA CAA CAA CAA CAA CAT OCT ATC ACT AAC AAA TCT TTC CCC 

Ile Pro Asp Ala Arg Pro Arg Ala Asp Ile Ala Leu Tyr Leu Gly Asp Ile Gln Asp His Ile Ile Thr Met Phe Gln Asn Leu Leu 
2628 ATC CCC GAT CCA COT OCA ACA CCT GAT ATT OCA TTA TAT TTA OCT CAT ATT CAA CAT CAT ATA ATC ACC ATC TTT CAA AAT TTA TTA 

Ala Tyr Glu Lys Ile Phe Ser Arg Ser His Ser Asn Tyr Leu Ala Gln Leu Gln Val Glu Ser Phe Asn Ser Asn Asn Lys Ile Thr 
2715 GCC TAT GAA AAA ATT TTC ACT CCT TCA CAT TCA AAT TAT TTA GCT CAA TTA CAA CTT GAA TCA TTC AAT TCC AAT AAT AAA ATC ACC 

Glu Met Phe Ser Lys Ile Thr Leu Ile Gly Thr Met Leu Val Pro Leu Asn Leu Val Thr Gly Leu Phe Gly Met Asn Val Arg Val 
2802 ATS WT TCT AAA ATT ACT TTG ATT QGO ACA ATS TTA CTT CCA TTA AAT TTA GTC ACC GCA ffPT TTT OCT ATC AAT CTA AGA CTC 

Pro Gly Glu Gly Gly Thr Asn Leu Gly Trp Phe Phe Gly Ile Val Gly Val Leu Ile Phe Ile Ile Ile Gly Ser Phe Ile Phe Ala 
2889 CCT GGT GAA CCT OGT ACC AAT TTA GCT TOG TTT TTC CGA ATT CTT CGA CTA TTA ATA TTT ATA ATT ATT OGA TCA TTT ATA TTT GCT 

Gln Trp Trp Leu Lys Lys Leu Asn Asn Ser Ile Glu Gly Gln Asn Asn Gly Asn Arg Pro Ile Phe Asn His Ser Ser Arg Arg Ser 
2976 CAA TOG TGG TTG AAA AAA TTC AAT AAT TCA ATT GAA GGA CAA AAT AAT CCT AAT CGA CCA ATT TTT AAT CAT TCA TCA AGA AGA TCA 

Ile Arg Ser Leu Gly Leu Lys Lys His Gly Gly Asn Lys Ser Ile Ile Ser Phe Pro Asn Lys Tyr Glu stop 
3063 ATT ACA ACT TTA GGT TTA AAA AAA CAT OCT OCT AAT AAA TCA ATT ATT ACT TTC CCC AAT AAA TAT CAA TAA CAATAATCAAACAAATGCC 
3154 ACAGAGTTTCATGGTTTCTTTTTTTTTT 

3269 "»™taattctatataatcgtatactaactt^ 

3384 ACGATCTAAAAGAAGTTTTTAAAGAAGW 
3499 TTTTCAAATAAAATATAAGTTTATCTAAATO^ 

3614 ^^^AAATAAACTl"PCAAAAO<AATCAGTAAATTCCCTTATATATT)Ql^rCC 
Figure 3A. 
1 

62 CTAATTCAACrTCTTCTACTm 
177 ACGAACACCAAACACACGAACAAAAAAAAAATTCCAAC^^ 
292 TTACTCATAAACTACTTOCTCATATCTTOCTCT^ 

Met Glu His Pro Pro Ala Ala Leu Arg Thr Phe Ser Thr Gln Ser Thr Ser Ser Leu Asn Ser Val Ser Thr Val Ser Ser Ser Arg 
407 ATC CAA CAT CCA CCA CCA OCT CTC AGA ACA TTT TCA ACC CAA TCA ACT TCA TCT TTC AAT TCA GTA ACT ACT CTT TCC TCT TCA ACA 
Ile Val Ser Leu Gly Pro Val Asn Ile Asn Asn Phe Asn Lys Pro Ser Thr Pro Lys Asp His Leu Phe Tyr Arg Cys Glu Ser Leu 
494 ATT CTT TCT CTC CGC CCA CTC AAT ATA AAC AAT TTC AAT AAA CCA ACT ACT CCC AAA GAC CAT TTA TTC TAT CCA TCT CAA TCA CTA 
Lys Arg Lys Leu Gln Lys Ile Pro Gly Met Glu Pro Phe Leu Asn Gln Ala Phe Asn Gln Ala Glu Gln Leu Ser Glu Gln Gln Ala 
581 AAA CCA AAA CTA CAA AAA ATC OCT CCC ATC GAA CCA TTT TTG AAC CAA CCT TTC AAT CAC GCT GAA CAA CTC ACT CAA CAA CAA CCA 
Leu Ala Leu Ala Gln Glu Arg Ser Asn Gly Asn Gly His Ser Asn Gly Lys Arg His Gln Ser Leu Asp Gly Ala Met Asn Arg Leu 
668 TTG GCT TTG GCA CAG GAA AGA ACC AAT GGA AAT CCA CAT ACT AAT CCC AAA CCT CAT CAA TCA TTA GAC CCT GCC ATC AAT ACA CTT 
Ser Val Gly Ser Asp Ser Ser Ser Ile Gln Gly Ser Leu Thr Arg Met Ala Thr Asn Ala Ser Thr Ser Ser Leu Ile Ser Gly Met 
755 TCA CTT GCT TCT GAT ACT ACT TCG ATT CAA CCT TCA TTG ACA CCA ATC GCT ACC AAT GCG TCA ACC TCA TCT TTA ATC ACT GCT ATC 
Pro Asn Ser Asn Thr Leu Phe Thr Phe Thr Ala Gly Val Leu Pro Ala Asn Ile Ser Val Asp Pro Ala Thr His Leu Trp Lys Leu 
842 CCA AAC ACC AAC ACT TTA TTT ACC TTT ACT GCA GCC CTT TTA CCA CCT AAT ATT ACT CTC GAT CCT GCT ACC CAT CTT TCG AAA TTG 
Phe Gln Gln Gly Ala Pro Phe Cys Val Leu Ile Asn His Ile Leu Pro Asp Ser Gln Ile Pro Val Val Ser Ser Asp Asp Leu Arg 
929 TTC CAA CAA GGC GCC CCC TTT TCT CTT CTT ATC AAT CAT ATC CTT CCT GAT TCC CAA ATA OCA CTT CTC ACT TCT GAT GAC TTG AGA 

Ile Cys Lys Lys Ser Val Tyr Asp Phe Leu Ile Ala Val Lys Thr Gln Leu Asn Phe Asp Asp Glu Asn Met Phe Thr Ile Ser Asn 
1016 ATT TCC AAA AAA TCA GTA TAT CAC TTT TTA ATT GCC CTC AAC ACA CAA TTC AAT TTT CAT GAT CAG AAT ATC TTC ACT ATA TCC AAT 
Val Phe Ser Asp Asn Ala Gln Asp Leu Ile Lys Ile Ile Asp Val Ile Asn Lys Leu Leu Ala Glu Tyr Ser Asp Ala Ser Asp Leu 
1103 CTT TTC TCC CAC AAT CCC CAA GAT TTA ATC AAG ATT ATT CAT CTC ATT AAT AAA CTA CTT CCT CAG TAC TCA CAT CCT ACT CAC CTC 

Gly Gly Gly Asp Glu Asp Val Asn Met Asp Val Gln Ile Thr Asp Glu Arg Ser Lys Val Phe Arg Glu Ile Ile Glu Thr Glu Arg 

1190 GGT GCT GCC GAT GAA GAT CTA AAT ATC GAT CTT CAA ATT ACC GAT GAA AGA TCA AAA CTT TTC CCA GAA ATT ATC GAA ACA GAA ACA 

Lys Tyr Val Gln Asp Leu Glu Leu Met Cys Lys Tyr Arg Gln Asp Leu Ile Glu Ala Glu Asn Leu Ser Ser Glu Gln Ile His Leu 

1277 AAA TAT CTT CAA GAC TTC GAA CTA ATG TCT AAA TAC COT CAA GAT CTA ATT GAA GCC GAA AAT TTG TCT TCA CAA CAA ATT CAC TTG 

Leu Phe Pro Asn Leu Asn Glu Ile Ile Asp Phe Gln Arg Arg Phe Leu Asn Gly Leu Glu Cys Asn Ile Asn Val Pro Ile Arg Tyr 

1364 TTA TTC CCA AAT TTA AAT GAC ATT ATT GAT TTT CAA AGA CCA TTC CTC AAT GGC TTA GAA TCT AAC ATC AAT GTA CCT ATT AGA TAT 
Gln Arg Ile Gly Ser Val Phe Ile His Ala Ser Leu Gly Pro Phe Asn Ala Tyr Glu Pro Trp Thr Ile Gly Gln Leu Thr Ala Ile 
1451 CAA AGA ATT GGA TCA GTA TTT ATT CAT CCT TCT TTG CGC CCT TTC AAT CCT TAT CAA CCT TCG ACT ATA GGA CAA TTG ACQ GCG ATT 
Asp Leu Ile Asn Lys Glu Ala Ala Asn Leu Lys Lys Ser Ser Ser Leu Leu Asp Pro Gly Phe Glu Leu Gln Ser Tyr Ile Leu Lys 
1538 CAT TTC ATC AAC AAA CAA CCT CCT AAT TTC AAA AAA TCG TCA ACT CTA CTT GAT CCT CGC TTT CAA CTT CAA TCG TAT ATA TTA AAG 

Pro Ile Gln Arg Leu Cys Lys Tyr Pro Leu Leu Leu Lys Glu Leu Ile Lys Thr Ser Pro Glu Tyr Ser Lys Gln Asp Pro His Gly 
1625 CCG ATC CAA AGA TTG TCT AAA TAC CCA CTT TTG TTG AAA GAG TTA ATC AAA ACA TCA CCA CAA TAT TCA AAA CAC CAC CCC CAT CCC 
Ser Ser Ser Ser Thr Ser Phe Asn Glu Leu Leu Val Ala Lys Thr Ala Met Lys Glu Leu Ala Asn Gln Val Asn Glu Ala Gln Arg 
1712 AGC TCG TCA TCG ACA TCA TTC AAT GAA TTA TTG GTQ GCT AAA ACT CCA ATC AAA GAA TTG GCA AAT CAA CTC AAT GAG GCC CAA AGA 
Arg Ala Glu Asn Ile Glu His Leu Glu Lys Leu Lys Glu Arg Val Gly Asn Trp Arg Gly Phe Asn Leu Asp Ala Gln Gly Glu Leu 

1799 CCA GCA CAA AAT ATC GAA CAT TTC GAA AAA CTA AAA GAA ACA CTA OCT AAT TCC CCT COG TTT AAT TTC CAT OCT CAA GGA GAA CTA 

Leu Phe His Gly Gln Val Gly Val Lys Asp Ala Glu Asn Glu Lys Glu Tyr Val Ala Tyr Leu Phe Glu Lys Ile Val Phe Phe Phe 
1886 TTA TTC CAC CCA CAA CTT CCG CTT AAA CAT GCT CAA AAT GAA AAG CAA TAC CTT GCT TAT CTT TTT GAA AAA ATC GTA TTT TTT TTC 

Thr Glu Ile Asp Asp Asn Lys Lys Ser Asp Lys Gln Glu Lys Lys Ser Lys Phe Ser Thr Arg 
1973 ACA GAA ATT GAT GAT AAC AAA AAA TCT GAT AAA CAG GAA AAG AAG AGC AAG TTT TCG ACA AGA AAG 